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Abstract —A new Bayesian approach to linear system iden¬ 
tification has been proposed in a series of recent papers. The 
main idea is to frame linear system identification as predictor 
estimation in an infinite dimensional space, with the aid of 
regularization/Bayesian techniques. This approach guarantees 
the identification of stable predictors based on the prediction 
error minimization. Unluckily, the stability of the predictors 
does not guarantee the stability of the impulse response of 
the system. In this paper we propose and compare various 
techniques to address this issue. Simulations results comparing 
these techniques will be provided. 


I. Introduction 

Recent approaches for linear system identification describe 
the unknown system directly in terms of impulse response, 
thus describing an infinite dimensional model class. Needless 
to say, this is not entirely free of difficulties, since an 
alternative way to control the model complexity, i.e., to 
face the so called-bias variance tradeoff [1], [2], need to 
be found. It has been shown in the recent literature that the 
apparatus of Reproducing Kernel Hilbert Spaces (RHKS) or, 
equivalently, Bayesian Statistics provide powerful tools to 
face this tradeoff. 

The paper [3] has shown how these infinite dimensional 
model classes can be used for identification of linear sys¬ 
tems in the framework of prediction error methods, leading 
naturally to stable predictors. Yet stability of the predictor 
model does not necessarily guarantee stability of the so called 
“forward” (or simulation) model. As a matter of fact, we 
faced this stability issue when performing identification on 
a real data set from EEG recordings. Physical insight in this 
case suggests that the transfer function describing the link 
between potentials in different brain locations are expected 
to be stable, while the identified models where not. 

Therefore, motivated by this real-world application, in this 
paper we shall tackle the problem of identifying stable (simu¬ 
lation) models when nonparametric prediction error methods 
[3] are used. We shall describe and compare, through an 
extensive simulation study, four possible solutions to this 
problem. 

The paper is structured as follows: Section [II] formu¬ 
lates the problem. Sections [Hl]|Vl introduce four different 
approaches to guarantee stability of the identified models. 


Experimental results are described in Section VI and con¬ 
clusions are drawn in Section lYlIl 

Notation: Given a matrix M, M T shall denote its trans¬ 
pose, er(M) will be its eigenvalues. If A(z) is a polynomial, 
<j(A(z)) will denote the set of roots of A(z). Given two 
discrete time jointly stationary stochastic process y(t) and 
z(t), the symbol E[y(f)|^(s), s < t] shall denote the linear 


minimum variance estimator (conditional expectation in the 
Gaussian case) of y(t) given the past ( s < t) history of z(t). 

II. Statement of the model stabilization problem 

We shall consider two jointly stationary discrete time zero 
mean stochastic processes {u(t}}, {y(t)}, t £ Z, respectively 
the “input” and “output” processes. 

As shown in [4], [5] under these assumptions there is an 
essentially unique representation of y(t) in terms of u(t) of 
the form 

y(t) = P{z)u(t) + H(z)e(t.) 

e{t) := y(t)-E[y(t)\y(s),u(s),s <t] 

where 

P(z) H(z) := ^ h k z~ k h 0 = 1 (2) 

fe=1 fc=o 

and H ( 2 ) is minimum-phase. This guarantees stability of the 
predictor y(t\t — 1) := E[j/(f)|y(s), s < t\: 

y(t\t - 1) = H(z)- 1 [( H{z ) - 1 )y(t) + P(z)u(t )] (3) 

In this paper we shall also assume that P(z) (and thus // (z)) 
are stable^ (i.e., analytic inside the open unit disc). 

Prediction error approaches to system identification [1], 
[2] are based on estimating the predictor model 

V(t\t ~ 1) = F(z)y(t) + G(z)u(t) 

F(z) = EZi hz~ k G(z) = Er=i 9kZ~ k 

Classic parametric methods [1], [2] start from a parametric 
description Pg(z) and Hg(z) of P{z) and H(z) in <0 
This parametrization is usually constrained (9 £ 0) so as 
to account for prior knowledge such as stability of Pg{z), 
Hg(z) and II 1 (z). This induces a natural parametrization 
of the predictor y(t \t— 1) which is thus denoted by yg(t\t— 1). 
Given a data set y := {2/(f)}t=i,..,T, u := {w(f)} t= i i .. i 7 ’, the 
parameters 9 are then estimated minimizing the squared loss 

]f-i)) 2 . (5) 

t —1 

More recently prediction error identification has been for¬ 
mulated in a nonparametric framework [3]. The main issue 
working in a nonparametric (possibly infinite dimensional) 
framework is that the problem of finding estimators f,g of 
/ := {fk}ke z+ and g ■= {g k }ke z+ from measurements y,u 
is an ill-posed inverse problem [6]. The main idea, borrowed 
from [7] is to minimize the prediction error 0 searching for 

^ote that in a feedback configuration P(z) is in principle allowed to be 
unstable provided there is a stabilizing feedback in action. 





{fk}ke z+ and {g k }k&+ in a suitable Reproducing Kernel 
Hilbert Space (RKHS) [8] which acts as a regularize^ also 
encoding notions of “stability” of the predictor (e.g. making 
sure that the estimated F(z) and G(z) are BIBO stable 
with probability one), see [3], [7] for details. Equivalently 
one can think that {/fcjfcez+ and {g k }kez+ are modeled as 
independent zero mean Gaussian Process [9] with a suitable 
covariance K(t,s ) = cov(f t , f s ) = cov(g t ,g s ) (the same as 
the Reproducing Kernel above). This covariance is usually 
parametrized by some unknown hyperparameters g, which 
will be made explicit in the notation using a subscript, 
e.g. K v and p v (f,g) = P v {f)p v {g)- Under the assumption 
that the innovation process is Gaussian and independent 
of / = {fk}ke z+ and g = {gk}ke z+, also the marginal 
Prj(y,u) and the posterior p v {f,g\y,u) are Gaussian, see 
[3] for details. The marginal density p v (y,u), also called 
marginal likelihood, can be used to estimate the unknown 
hyperparameter as: 


fj M L ■= arg max v p v (y , u). (6) 

Then, following the Empirical Bayes paradigm, estimators 
of / = {fk}ke z+ and g = {g k }kez.+ are then found 
from their posterior density p v (f, g\y, u) having fixed the 
hyperparameters to their estimated value g ml [3]: 

f [f\y, «], 9 ~^f,ML\9\y,u] (7) 


where E & ML [• | •] denotes conditional expection having fixed 
V = Vml- 

Unfortunately, BIBO stability of the impulse responses of 
{/fcjfcez+ and {g k } ke z+ does not guarantee BIBO stability 
of the estimates 


Hz) := 


G(z) 

l-Hz)' 


H(z) := 


1 


1 - F(z) 


( 8 ) 


of P{z) and H(z) in (jTJ). In fact, BIBO stability of the 
sequences {fk}ke z+ and {<7fc}fcez+ have no relation with 
stability of P(z) and H{z) which, if no cancellations occur, 
depends on the zeros of 1 — F(z) = 1 — ELi fkZ~ k . 

For practical purposes when estimating the predictor model 
Q, the impulse responses {/fc}fcez+ and {<?fc}fcez+ are trun¬ 
cated to a finite (yet arbitrarily large) p , so that we assume 

Hz) = ELi hz~ k , G(z) = ELi 9kZ~ k . 

Thus, the problem we consider in this paper, can be 
formulated as follows: 

Problem 1: Given y(t),u(t), t £ [1 ,T], find {f k }ke[i, P ] 
and {gfc}fcg[i, p ] so that P(z) and H(z) in |8]i are BIBO stable 
transfer functions. A sufficient generic0 condition for this to 
happen is that 


A(z) = z?( 1 - ELi hz~ k ) =zP~ [zP - 1 ... 1]/ 

/:= [/i,/ 2 ,...,/ p ] T 


(9) 


is stable, i.e., has all roots inside V := {z £ C : |^| < 1}. 

In the following we describe and compare three different 
techniques to achieve this aim. For each technique Problem[l] 


2 i.e., if no cancellations occur, which is generic for estimated impulse 
responses. 


is properly reformulated. In order to simplify the notation, in 
what follows, the input u will be dropped from the notation; 
therefore, for instance, we shall use p v (y) in lieu of p v (y, u). 


III. Stabilization via LMI constraint 


The first stabilization technique is based on formulating 
stability of the model ([8]) as a constraint on the eigenvalues 
of the companion matrix of A(z) in (|9). This constraint can 
be characterized in terms of Linear Matrix Inequalities (LMI) 
as discussed in [10], and used later on in [11] to enforce 
stable models in subspace identification, thus leading to: 

Problem 2 (Reformulation): Given a preliminary estimate 
/ : = [/i> /p] T - find a vector of coefficients / so that 


/ = arg mm 


f-f 


GO) 


where Tx> ■= {/ £ R p : |A| < 1 VA s.t. A( A) = 0 ,A(z) = 
z p — ... 1]/}, can be described by an LMI constraint 

as discussed below. 

It should be observed that the use of the 2-norm in © 
is entirely arbitrary and, in fact, considering some form 
of model approximation error (e.g. difference of output 
predictors) would be preferable. In addition, when / is 
the outcome of a preliminary estimation step, a principled 
solution would require accounting for the distribution of /. 

However, this brings in some technical difficulties related 
to the formulation of the quadratic problem, therefore, it is 
still subject of research. 


Formulation of the LMI constraint 

As shown in [10], a matrix F has all its eigenvalues in the 
LMI region D = {z £ C s.t. fv{z) > 0}, where fv{z) is 
an opportune polynomial matrix, if and only if there exists 
P = P T > 0 s.t. 


M(F,P) = / 2 ®P+ 


0 1 
0 0 


(FP) J + M >0(11) 


According to [11, Theorem 1], which presents small 
variations w.r.t the original central theorem in [10], we define 
the companion matrix of / as \D(/) £ K pxp . Therefore, 
using ( fTTj i, / is (Schur) stable if and only if 3P = P T > 0 
such that M(T'(/),P) > 0. 

Unfortunately M(T'(/),P) this is not linear in / and P 
since their product appears. Similarly to [11], this calls for 
a reparametrization of the constraint as follows: define the 
vector tb := Pf (so that / = P -1 ?/>), J := [0J o -il, and 
M{i!>,P) :=M(*(f),P) i.e., 


M(ip,P) =/ 2 ®P + 


0 1 
0 0 




( 12 ) 


which is linear in ip and P. Thus problem [2] can be refor¬ 
mulated as: 


ip, P = arg min 
f,P 


ip- P/b 


s.t. M(tp,P)> 0, Tr(P) = p, 


P = P T > 0 

( 13 ) 


















where the constraint Tr(P) = p is added to improve the 
numerical conditioning, see [ 11 ] for further details. 

The solution / of Problem I 2 I is finally computed as: 

/ = P (14) 

In the remaining of the paper the model P{z) obtained by 
plugging in | 8 ]i the estimators / and g obtained respectively 
from ( p~4] > and the Bayesian procedure in [3], will be called 
“LMI” model. 

IV. Stabilization via Penalty Function 

The second stabilization technique is formulated to act 
directly inside the Bayesian procedure. As briefly discussed 
in section [Tlj a crucial step of the Bayesian procedure is the 
estimation of the hyperparameter vector 77 through marginal 
likelihood optimization ([ 6 |. It is in principle possible to 
restrict the set of admissible hyperparameters to a subset Eg 
which lead to estimators 0 corresponding to stable models 
P(z) and H(z). This is not entirely trivial as the estimators 
(and thus the set Eg) depend on the measured data y, u. This 
leads to the following: 

Problem 3 (Reformulation): Estimate the hyperparame¬ 
ters 77 solving 

V = arg max p v (y) = arg min - In p v (y) (15) 

i)EH s 

to the set Eg = {ij\A(z) Stable }, i.e., the set of hyperpa¬ 
rameters which lead to stable models P(z), H(z). 

To force 77 £ Eg, we can add a penalty function to the 
criterion in which acts as a barrier to keep the estimate 
77 away from the set of hyperparameters 77 leading to an 
unstable A(z). In order to do so, we define A v {z) the 
polynomial A(z) in ([9]) built with the estimator 

f v :=E v [f\y,u], (16) 

and p v = max \o(A v (z))\. Next define the penalty function: 

J ^ {<*($-P v )) a {aS) a (17 ' ) 

where <5 > 1 is a scalar which defines the barrier, a is a 
positive scalar which adjust how steep the barrier is. 

As we can see in FigurefTIfunction ( fT7| diverges ( J(p v ) —> 
00 ) when p —> (5 and J(fif) —t 0 when p —> 0. Thus when 
<ID is added to the minimization problem ( fi~5j ), the solution 
p is pushed inside the stability region. The parameters a and 
6 are iteratively adjusted so as to guarantee that the final 
solution leads to a stable model, i.e. solves the constrained 
problem ( fl5| ). 

Notice that when a —> 0, J(p v ) gives no penalty for 
p v < 6 and infinite penalty for p v > 6. Elaborating upon 
the intuition above, it is easy to prove that the solution of 
Problem [3] can be found by the algorithm described below: 

Algorithm 1: 

1) Initialization: 

• Compute ? 7 q using 4® and set a = 1. 

• Compute the predictor impulse response f, l0 using 

then determine the associated A Vo (z), p Vo . 



Fig. 1. Representation of the penalty function J(prj). The red bullet 
represents the penalty function value associated to a specific p in an 
illustrative example. The blue (head filled) arrows show the effect of the 
penalty function on p , the black (head no-filled) arrows the effects of 
changing the parameters a and The blue dashed line represents the 
variation of J(pri) after a reduction of a. 


2) While p m > 1 

. Set S = p m { l + e) 

• Compute 

t]k = arg min — In p v (y) + J(p v ) (18) 
v 

and the associated p Vk 

• If the value of — In p Vk ( y) + J(p Vk ) is unchanged 
w.r.t. the k — 1 iteration, then perform the update: 
a = a — A a, 6 = 6 — A6 where Aa and A 6 are 
chosen sufficiently small 

3) Set a = e and 5 = 1. 

Finally, the solution of Problem [3] is given by: 

77 = argmin — In p v (y) + J v ( 77 ) (19) 

f = ^f)[f\y,u\, g = Efj[g\y,u\ (20) 

In the remaining of the paper the model obtained by 
using ( |20| will be called “ML + PF” model. 

Remark 1: Notice that the iterative procedure which up¬ 
dates 6 and a is needed because, in general, it is not 
guaranteed that one can find an initial value of 77 £ Eg. 
Note also that the set Eg is always non-empty provided 
the hyperparameter vector 77 includes a scaling factor for 
the Kernel, i.e., a non negative scalar which multiplies the 
Kernel matrix. In fact, if this is the case, there exist values 
of 77 which lead to f = 0 which, in turn leads to stable P(z) 
and H(z). 

V. Stabilization via Markov Chain Monte Carlo 

In this Section we shall present a MCMC approach which 
yields the so called full Bayes estimator of / and g, intro¬ 
ducing a (possibly non-informative) prior density [^ 75 ( 77 ) on 

’This may be a uniform distribution if the domain is compact. 











the hyperparameter vector p. In order to enforce the stability 
constraint we consider the “stable” posterior distribution 

ps(f,g\y) = Jp(y\f,g)ps{f,g\v)p{.v)dv ( 21 ) 

where ps(f,g\v) is the “truncated” Gaussian prior 


ps(f,g\v ) 


k v P v (f,g ) / : A(z) stable 

0 otherwise 


( 22 ) 


which, a priori, excludes all impulse responses / which lead 
to unstable A(z). Note that the constant k v in ( [22] ) equals 

k V ■= J^pifMdfdg ’ Where T ■= if\M Z ) Stable i- 
Unfortunately, the “stable” conditional 


/ f | x p(y\f,g)ps(f,g\ii) 

Ps{f,g\y,v) ■= - 7 —t- 

ps{y,g) 

is not Gaussian and, in addition, the integral in ( | 2 T| > cannot be 
computed in closed form. Therefore we tackle the problem 
using MCMC methods: 

Problem 4 (Reformulation): 

Obtain a sampling approximation of the “stable” posterior 
distribution ( | 2 Tj ). Compute from these samples the estimates 
f,§ in 0 and P, H in 0 which satisfy the stability 
constraint. This will be done computing sample posterior 
means as well as sample MAP. 

In order to sample from the stable posterior ( | 2 T| one can 
use a Metropolis-Hasting type of algorithm as in [12]. 

We have now to address two fundamental issues for this 
algorithm to be implementable, namely: 

(i) Design the proposal density Qf, g (-\-) 

(ii) Compute the posterior ps{f,g\y), U P to a constant 
multiplicative factoi^] 

A preliminary step for both items (i) and (ii) is the 
computation of a set of samples pi ~ p{p\y) from the 
posterior of the hyperparameters, without accounting for the 
stability constraint. 

In the next subsections we address these three issues. 


Sampling from the posterior density p(p\y) 

First, our aim is to draw points from the posterior density 
of r] given y. Notice that: 

f , \ Pr,(y)p(v) 

pvn\y) = / x — ( 23 ) 

p{y) 

where, as mentioned earlier on, p(rj) is assumed to be a non 
informative prior distribution, and p(y) is the normalization 
constant. The marginal density p v {y) of y given p can be 
computed in closed form, as discussed in [3] and is given by 

Pv(v) = ex P ln(det[27rS^]) - ^ T E~ 1 ?/j (24) 

where 

£„ = AK v A r + BKr,B T + a 2 1 (25) 

where a 2 := Var{e(t)} is the variance of the innovation 
process |T]i and A , B are matrices built with the past input- 
output data, see [3] for details. 


4 This is because only ratios of probabilities need to be computed. 


In order to obtain samples from ( [23| we implemented a 
Metropolis-Hasting algorithm, see e.g. [12]. We are using 
a symmetric proposal distribution q v (- 1 -) which describes 
a random walk in the hyperparameter space, whose mean 
is centered in the present value and its variance contains 
information about the local curvature of the target. To do so, 
let us define: 


V = argmin„-ln \p v (y)p(v)\ 

_ _ d 2 ln[p a (y)p(Tj)] (26) 

** — drjdri T 

that is the Hessian matrix computed in p. Thus we define 
grti'lp) = Af(p,pH~ 1 ) where 7 is a positive scalar chosen 
to obtain an acceptance probability in the MCMC algorithm 
around the 30% via a pilot analysis, see e.g. [13]. 

The acceptance rate of the MCMC results to be: 


a Vi = min 



Pr H (y)p( r li) \ 

Pm-i(y)p(vi-i) J 


Proposal density 

It is well known in the MCMC literature that an accurate 
choice of the proposal distribution may have a remarkable 
impact on the performance of the Markov Chain. In this 
paper we adopt a data-driven proposal computed from the 
posterior distribution disregarding the stability constraint. 
The algorithm we consider is based on the approximation 


p(f,g\y)= p{f,g\y,v)p{v\y)dp 


1 > . 

jf^Pm(f,g\y) (27) 


where ry, i = 1,.., N are the samples from p(p\y) drawn by 
the MCMC algorithm above and 


Pm(f,g\y) ~7% 


K AP 


V MAP 


(28) 


is the (Gaussian) posterior density of f,g when the hyper¬ 
parameters are fixed equal to Pi. The posterior means and 


variance are, respectively: p. 


MAP 
V 


■= (E 7) [/|y],E r) [5 r |,t/]) 


K v[f\y] = K v A T Y, v 1 y , E v \g\y\ = K v B t T, x y 


^MAP 


= K rl - K rl 


A T ' 

B t 




(29) 


Kn 


K ri O 
O K v 


and E r; is defined in ( |25| l. 

From ( [27] ) it follows that, in order to sample from the 
proposal density p(f,g\y) one can 

1) Sample pi ~ p(p\y) 

2) Sample (f,g) ~p Vi (f,g\y) in ((28) 


Evaluation of the stable posterior ps(f, g\y) 

The stable posterior in equation ( | 2 T| can be approximated 
as follows: 

ps{f,g\y) = Jps(f,g,v\y)dp 

= fp(y\f,g)ps{f,g\v)p(v)$$jdp (30) 

^ 1 p(y\f,g)ps(f,g\rn)p(Vi) 

— Np(y) Z^i =1 9(774) 

with pi ~ q(p). Note that the quantities p(y\f,g), Ps(f,g\v) 
and p(rj) can be evaluated. Thus, setting q(p) '■= p{p\y) and 












using the MCMC algorithm described above to obtain sam¬ 
ples from the posterior p(r]\y), the stable posterior ps(f, g\y) 
can then be approximated (up to the irrelevant normalization 
constant p(y)) from equation (|30|>. 


Algorithm 

We are now ready to provide the MCMC algorithm to 
sample from the stable posterior ps(f,g\y) 

Algorithm 2 (MCMC): 

Hyper-parameters MCMC: 

1) Initialization: set 77 o = rj using ( [26] ) 

2) For i > 0 Iterate: 

. Sample 77 from <3V,(-|?7i_i) ~ 7 # -1 )) 

• Sample u from a uniform distribution on [0,1] 

77 if u < 


Pru(y)p(,Vi) 


Set T]i = i ■' “ ~ “ p r ,._ 1 (y)p(ri i - 1 ) 

[ r]i-i otherwise 

3) After a burn-in period, keep the last N samples r]i 
which are (approximately) samples from p(t]\y). 

Predictor Impulse Responses MCMC: 

4) Initialization: compute [fo,go\ from 770 using 

5) For i = 1 to N do 
,map y, map 


compute n ' 
Sample (f',g) 
Compute a as 

a := min 


from A f(p; 


as in 

MAP 


1 MAP 


1 , 


ps[f\g'\y)p{f (k \g {k) \y) 

psU {k \ g (k) \y)p(f i g' \y) 


Set: (/««,«) = 


with ps{f,g\y) and p(f,g\y) approximated as in 
© and in ([271- 

Sample u from a uniform distribution on [0,1] 

(/ ,g ) if u < a 

otherwise 

6 ) The samples (f^\g^) obtained above are i.i.d. sam¬ 
ples from ps{f,g\y) as requested by Problem [4] The 
estimates of P(z) and H(z) can be obtained as: 

• Minimum Variance Estimate: from each sample 
(/ W ,5 W ) compute the impulse responses Pi(z) 
and Ili(z) in ([ 8 ]) and compute the averages 

N , N 

( 31 ) 


i= 1 i= 1 


We shall define p .— {Pk} k£[i,p]> h ■— 

the inverse 2 -transforms of P and H in ( |3T] >. 

Maximum a Posteriori Estimate 


f,g = argmaxp s (/, 5 (|y) (32) 

fi , 9 i 

In the remaining of the paper the model obtained by ([ 8 ]) 
using pTj ) and ( |32| ) will be called “MCMC posterior mean” 
model “MCMC MAP” model, respectively. Note that, from 
m an estimate of P{z) is obtained directly. This is to 
guarantee that P(z) is stable since the average ]>A P(z) of 
BIBO stable function is BIBO stable. On the other hand, if 
one average (0 the /W directly, there would be no guarantee 

5 Recall that the average of stable polynomial is not necessarily a stable 
polynomial unless the degree is smaller than 3, see [14]. 


that the average / would lead to a stable A(z) (and thus a 
stable model). Of course, if needed, an estimate of F can be 
obtained from 0 using P and H in pTj ) : 

G(z) := H~ 1 (z)P(z), F{z)-.= l-H~ 1 {z) 

VI. Simulations 

The performance of the techniques described the paper 
are compared by means of a Monte Carlo experiment, 
considering identification or marginally stable models, i.e., 
with poles close to the complex unit circle. At each Monte 
Carlo run a 2 n,, -order SISO ARMAX model, called M, is 
generated: 

A(z)y(t) = kz^ 1 B(z)u(t) + C(z)e(t) (33) 

The two complex conjugate roots of the monic polynomial 
A(z) are placed in 0.996 • exp(±j|), B{z) is a random 
polynomial whose roots are restricted to lie inside the circle 
of radius 0.9 and C(z) has randomly roots chosen in the 
interval [0.65, 0.73] so to ensure that the predictor impulse 
responses decay in no more then 30 steps. 

The system input u(t) and the disturbance noise e(t) are 
independent white noise with unit variance (for both identi¬ 
fication and test data sets). The constant k is designed so that 
the signal-to-noise ratio of ( |33j ) is one. More specifically, let 
y u {t) ■= B(z)/A(z)u(t) and y e (t) := C(z )/A(z)e(t), then 
k as been set to: k = y/var(y e )/var(y u ). A Monte Carlo 
study of 5000 runs is implemented. At each run a model as 
© is used to generate an identification set of 400 samples 
and a test set of 1000 samples. 

The predictor impulse responses / and g are estimated via 
the Bayesian System Identification described in [3] which is 
based on the Stable Spline Kernel as a priori covariance and 
the hyperparameters are determined as in 0. The predictor 
impulse responses are negligible for time lags larger than 
30 and thus the truncation length is chosen as p = 30. The 
variance of the noise o is computed via a low bias Least 
Square identification method. The estimators P and H in 
0 obtained from the Stable Spline estimators /, g ended 
up being unstable about 150 times out of 5000 Monte Carlo 
runs. In these cases the stabilization procedures described 
in these paper have been applied. Thus our Monte Carlo 
analysis is limited to these 150 data sets which resulted in 
unstable systems. 

The CVX toolbox , [15], which is based on YALMIP, was 
used in Matlab to solve the convex optimization problem 
©, with solver SeDuMi, [16]. Instead, the Matlab function 
fminsearch.m’ has been used to solve problem ( fl8[ . 

Notice that all these unstable models have been stabilized by 
our techniques. 

A. Performance results 

In order to illustrate the identification performances, we 
first consider dominant poles of the estimated, which are 
shown in Figure [2j the horizontal line in 0.996 indicates 
the absolute value of the “true” dominant poles. All the 
estimation methods, and in particular “ML+PF” and “MCMC 





Dominant Poles 



Fig. 2. Monte Carlo results. Boxplots of the absolute value of the dominant 
poles of the identified models. The horizontal line represents the absolute 
value of the dominant pole of the true model. 


posterior mean”, tend to place the poles close to the unit 
circle. 

In addition the estimated impulse responses are compared 
to the “true” ones in terms of relative errors on the estimated 
impulse responses: 


_ 1 Ibfc Pk ||2 1 II hk - h k || 2 

2 Ibfclh 2 \\h k \\ 2 


(34) 


where {p} and {ft} are the estimators of the tme impulsed 
responses {p} and {h}. 
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3. Monte Carlo results. Boxplots of the {errj}. 


Figure [3] reports the Boxplots of {err;} for the estimated 
models. The “MCMC posterior mean” estimator outperforms 
all the others significantly. The remaining three techniques 
yield rather poor quality properties in the identification of 
the system which is due to poor estimation of “dominant” 
modes. Indeed, a higher absolute value of the dominant pole 
corresponds to a slower decay rate of the impulse responses. 
When the estimators place a dominant pole very close to the 
unit circle, this results in a significant tail in the impulse 
response which, in turn yield a very high relative error. 


The algorithm “MCMC posterior mean” deserves a separate 
discussion. In this case, since the estimated P{z) is the 
average of all P's, the dominant pole of P is the slowest 
among the dominant poles of I 's. Yet, the effect of these 
dominant modes on the relative error is mitigated by the 
factor jj in the average m 

VII. Conclusion 

We have presented four different techniques to face the 
problem of identifying a stable system using a Bayesian 
framework based on the minimization of the predictor error. 
The experiment shows all methods ultimately produce stable 
models which perform comparably in terms of prediction 
error (not reported for reasons of space); however, only 
the model estimated with the so called “MCMC posterior 
mean” technique perform satisfactorily in terms of impulse 
response fit. In future work, we will discuss new techniques 
to overcome the problem in the identification performance 
without the usage of a MCMC. In particular, we are looking 
for new regularization which take in account penalty term 
both in the predictor and in the system impulse responses. 
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